496: feat: Replace from_avro_datum* functions with GenericDatumReader#61

Open
martin-augment wants to merge 1 commit into main from pr-496-2026-03-05-07-33-20

Conversation

@martin-augment
Owner

496: To review by AI

@coderabbitai

coderabbitai bot commented Mar 5, 2026

Walkthrough

This pull request refactors the Avro datum reading API by introducing a new GenericDatumReader<'s> struct with a builder pattern in a dedicated reader::datum module. The reader module is made public at the crate root. The previously exported convenience functions (from_avro_datum variants) are relocated from reader/mod.rs to reader/datum.rs and marked as deprecated, while remaining available for backward compatibility. Tests across multiple files are updated to use the new GenericDatumReader builder API instead of the deprecated functions. Additionally, ResolvedSchema<'s> gains Clone derivation to support cloning operations.
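The read path described above (build once, then decode with the writer schema and resolve against a reader schema only when one was supplied) can be sketched with std-only Rust. Every name here (`ToyDatumReader`, `ToyDatumReaderBuilder`, the two-variant `Schema`, `decode`, `resolve`) is a hypothetical stand-in, not the apache-avro API; the toy only handles Avro `long` (zig-zag varint) and `double` (8 little-endian bytes) plus the spec's `long` to `double` promotion.

```rust
use std::io::{self, Read};

/// Toy schema (hypothetical; not apache-avro's `Schema`).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Schema {
    Long,
    Double,
}

/// Toy decoded value (hypothetical; not apache-avro's `Value`).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Long(i64),
    Double(f64),
}

/// Hypothetical reader: a writer schema plus an optional reader schema.
pub struct ToyDatumReader<'s> {
    writer: &'s Schema,
    reader: Option<&'s Schema>,
}

pub struct ToyDatumReaderBuilder<'s> {
    writer: &'s Schema,
    reader: Option<&'s Schema>,
}

impl<'s> ToyDatumReader<'s> {
    pub fn builder(writer: &'s Schema) -> ToyDatumReaderBuilder<'s> {
        ToyDatumReaderBuilder { writer, reader: None }
    }

    /// Decode with the writer schema; resolve only if a reader schema was set.
    pub fn read_value<R: Read>(&self, input: &mut R) -> io::Result<Value> {
        let value = decode(self.writer, input)?;
        match self.reader {
            Some(reader) => resolve(value, reader),
            None => Ok(value),
        }
    }
}

impl<'s> ToyDatumReaderBuilder<'s> {
    pub fn reader_schema(mut self, reader: &'s Schema) -> Self {
        self.reader = Some(reader);
        self
    }

    /// Per the reviews, the real builder prepares its resolved schemata here,
    /// once, instead of on every read; the toy has nothing to prepare.
    pub fn build(self) -> io::Result<ToyDatumReader<'s>> {
        Ok(ToyDatumReader { writer: self.writer, reader: self.reader })
    }
}

fn invalid(msg: String) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg)
}

/// Avro encodes `long` as a zig-zag varint and `double` as 8 little-endian bytes.
fn decode<R: Read>(schema: &Schema, input: &mut R) -> io::Result<Value> {
    match schema {
        Schema::Long => {
            let (mut raw, mut shift) = (0u64, 0u32);
            loop {
                let mut byte = [0u8; 1];
                input.read_exact(&mut byte)?;
                raw |= u64::from(byte[0] & 0x7f) << shift;
                if byte[0] & 0x80 == 0 {
                    break;
                }
                shift += 7;
                if shift >= 64 {
                    return Err(invalid("varint too long".to_string()));
                }
            }
            // Undo zig-zag: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
            Ok(Value::Long((raw >> 1) as i64 ^ -((raw & 1) as i64)))
        }
        Schema::Double => {
            let mut buf = [0u8; 8];
            input.read_exact(&mut buf)?;
            Ok(Value::Double(f64::from_le_bytes(buf)))
        }
    }
}

/// Minimal resolution: identical types pass through; `long` promotes to `double`.
fn resolve(value: Value, reader: &Schema) -> io::Result<Value> {
    match (value, reader) {
        (Value::Long(n), Schema::Double) => Ok(Value::Double(n as f64)),
        (v @ Value::Long(_), Schema::Long) | (v @ Value::Double(_), Schema::Double) => Ok(v),
        (v, s) => Err(invalid(format!("cannot resolve {v:?} as {s:?}"))),
    }
}
```

For example, a reader built with `ToyDatumReader::builder(&Schema::Long).reader_schema(&Schema::Double)` reads the bytes `0x80 0x01` as `Value::Double(64.0)`, while the same bytes without a reader schema stay `Value::Long(64)`.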


@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the Avro datum reading mechanism by introducing a new GenericDatumReader struct. This new reader provides a more structured and flexible approach to deserializing Avro data, especially when dealing with schema resolution and complex configurations. The existing top-level from_avro_datum functions have been deprecated and re-implemented internally using the GenericDatumReader, ensuring backward compatibility while guiding users towards the new, more robust API. This change improves the maintainability and extensibility of the Avro reading capabilities.

Highlights

  • New GenericDatumReader: Introduced a new GenericDatumReader struct and its associated builder pattern for more flexible and structured Avro datum reading.
  • Deprecated Functions: The existing top-level from_avro_datum, from_avro_datum_schemata, and from_avro_datum_reader_schemata functions have been deprecated.
  • Re-implementation: The deprecated from_avro_datum* functions have been re-implemented internally to utilize the new GenericDatumReader, ensuring backward compatibility while promoting the new API.
  • Module Restructuring: Datum reading logic has been moved into a new dedicated module: avro/src/reader/datum.rs.
  • Test Updates: All relevant internal and external tests have been updated to use the new GenericDatumReader for Avro datum deserialization.
  • Schema Cloning: The ResolvedSchema struct now derives the Clone trait, enhancing its usability.
Changelog
  • avro/src/lib.rs
    • Moved reader module declaration to a new location
    • Updated pub use statements for from_avro_datum functions to reference the new reader::datum module and marked them as deprecated
    • Updated test imports to use GenericDatumReader
    • Modified test_illformed_length to use GenericDatumReader and return TestResult
  • avro/src/reader/datum.rs
    • Added new module containing the GenericDatumReader struct and its builder
    • Implemented read_value method for GenericDatumReader
    • Re-implemented and deprecated from_avro_datum, from_avro_datum_schemata, and from_avro_datum_reader_schemata using GenericDatumReader
    • Included unit tests for GenericDatumReader
  • avro/src/reader/mod.rs
    • Moved datum module to its own file
    • Removed the old implementations of from_avro_datum, from_avro_datum_schemata, and from_avro_datum_reader_schemata
    • Removed unused imports related to decode and ResolvedSchema
    • Removed several test cases that were moved to reader::datum
  • avro/src/schema/mod.rs
    • Updated test imports to include GenericDatumReader
    • Replaced calls to crate::from_avro_datum with GenericDatumReader::builder().build()?.read_value() in tests
  • avro/src/schema/resolve.rs
    • Derived Clone for the ResolvedSchema struct
  • avro/src/serde/de.rs
    • Updated test imports to include GenericDatumReader
    • Replaced calls to crate::from_avro_datum with GenericDatumReader::builder().build()?.read_value() in tests
  • avro/src/writer/datum.rs
    • Updated test imports to include GenericDatumReader
    • Replaced calls to crate::from_avro_datum with GenericDatumReader::builder().build()?.read_value() in tests
  • avro/tests/avro-3786.rs
    • Updated imports to use reader::datum::GenericDatumReader
    • Replaced calls to from_avro_datum with GenericDatumReader::builder().build()?.read_value() in tests
  • avro/tests/avro-3787.rs
    • Updated imports to use reader::datum::GenericDatumReader
    • Replaced calls to from_avro_datum with GenericDatumReader::builder().build()?.read_value() in tests
  • avro/tests/io.rs
    • Updated imports to use reader::datum::GenericDatumReader
    • Replaced calls to from_avro_datum with GenericDatumReader::builder().build()?.read_value() in tests
  • avro/tests/schema.rs
    • Updated imports to include reader::datum::GenericDatumReader
    • Replaced calls to from_avro_datum with GenericDatumReader::builder().build()?.read_value() in tests
  • avro/tests/to_from_avro_datum_schemata.rs
    • Updated imports to include reader::datum::GenericDatumReader
    • Removed direct imports of from_avro_datum_reader_schemata and from_avro_datum_schemata
    • Replaced calls to from_avro_datum_schemata and from_avro_datum_reader_schemata with GenericDatumReader::builder().build()?.read_value() in tests
Activity
  • The pull request was created and is awaiting review.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

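```rust
pub fn read_value<R: Read>(&self, reader: &mut R) -> AvroResult<Value> {
    let value = decode_internal(self.writer, self.resolved.get_names(), None, reader)?;
    if let Some((reader, resolved)) = &self.reader {
        value.resolve_internal(reader, resolved.get_names(), None, &None)
    } else {
        Ok(value)
    }
}
```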

Missing enclosing namespace in reader schema resolution

High Severity

read_value passes None as the enclosing_namespace argument to resolve_internal, but the old from_avro_datum code called value.resolve(schema) which internally calls resolve_schemata, which passes schema.namespace() as the enclosing namespace. This means schemas with namespaces that contain Schema::Ref entries relying on the enclosing namespace for fully-qualified name resolution will fail to resolve correctly. The third argument in the value.resolve_internal(reader, resolved.get_names(), None, &None) call needs to be reader.namespace() instead of None.

Owner Author


value:incorrect-but-reasonable; category:bug; feedback: The Bugbot AI reviewer is not correct! It does not matter whether the enclosing namespace passed for the top-level Schema is None or schema.namespace(): the implementation falls back to schema.namespace() when the enclosing namespace is None, so both behave the same.
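The fallback the author describes can be modelled with a small toy: an unqualified reference name is qualified with the enclosing namespace if one is passed, and otherwise with the schema's own namespace. `NamedSchema` and `qualify` are hypothetical names, not the apache-avro implementation, and the sketch only covers the top-level call this thread is about; it does not show how the real resolver threads namespaces through nested named types, which is where the reviewers' concern would matter.

```rust
/// Toy named schema (hypothetical; not apache-avro's `Schema`).
#[derive(Debug, Clone, PartialEq)]
pub struct NamedSchema {
    pub name: String,
    pub namespace: Option<String>,
}

impl NamedSchema {
    pub fn namespace(&self) -> Option<&str> {
        self.namespace.as_deref()
    }
}

/// Qualify a reference name: use the enclosing namespace when given,
/// otherwise fall back to the schema's own namespace (the assumed fallback).
pub fn qualify(reference: &str, schema: &NamedSchema, enclosing: Option<&str>) -> String {
    // Per the Avro spec, a name containing a dot is already a full name.
    if reference.contains('.') {
        return reference.to_string();
    }
    match enclosing.or_else(|| schema.namespace()) {
        Some(ns) => format!("{ns}.{reference}"),
        None => reference.to_string(),
    }
}
```

Under this model, `qualify("Item", &s, None)` and `qualify("Item", &s, s.namespace())` both yield `"com.example.Item"` for a schema in namespace `com.example`, which is the equivalence the author is pointing at.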

@augmentcode

augmentcode bot commented Mar 5, 2026

🤖 Augment PR Summary

Summary: This PR replaces the old standalone from_avro_datum* decoding helpers with a builder-based GenericDatumReader API.

Changes:

  • Adds avro::reader::datum with GenericDatumReader and moves the deprecated from_avro_datum* functions there
  • Makes the reader module public and updates crate-level re-exports to preserve existing entry points (with deprecation expectations)
  • Updates internal unit tests and integration tests to decode using GenericDatumReader instead of from_avro_datum
  • Derives Clone for ResolvedSchema to support passing pre-resolved schemata into the new reader

Technical Notes: Decoding now consistently goes through decode_internal + optional schema resolution, and the previous raw-datum helpers are retained as deprecated wrappers for compatibility.


@augmentcode augmentcode bot left a comment

Review completed. 2 suggestions posted.


```rust
pub fn read_value<R: Read>(&self, reader: &mut R) -> AvroResult<Value> {
    let value = decode_internal(self.writer, self.resolved.get_names(), None, reader)?;
    if let Some((reader, resolved)) = &self.reader {
        value.resolve_internal(reader, resolved.get_names(), None, &None)
```
GenericDatumReader::read_value calls value.resolve_internal(..., None, &None), but Value::resolve(_schemata) passes schema.namespace() as enclosing_namespace. This difference could break resolution for unqualified Schema::Ref names in the reader schema (especially for programmatically-constructed schemas).

Severity: medium


Owner Author

value:incorrect-but-reasonable; category:bug; feedback: The Augment AI reviewer is not correct! It does not matter whether the enclosing namespace passed for the top-level Schema is None or schema.namespace(): the implementation falls back to schema.namespace() when the enclosing namespace is None, so both behave the same.

```rust
/// **NOTE** This function has a quite small niche of usage and does NOT take care of reading the
/// header and consecutive data blocks; use [`Reader`](struct.Reader.html) if you don't know what
/// you are doing, instead.
#[deprecated(since = "0.22.0", note = "Use `DatumReader` instead")]
```
The deprecation notes (and the doc comment above) point users to DatumReader, but there doesn’t appear to be any DatumReader type/trait exported in this crate; this may confuse users trying to follow the migration guidance.

Severity: low

Other Locations
  • avro/src/reader/datum.rs:37
  • avro/src/reader/datum.rs:176
  • avro/src/reader/datum.rs:208


Owner Author

value:good-to-have; category:documentation; feedback: The Augment AI reviewer is correct! There is no DatumReader struct or trait. The correct name is GenericDatumReader. All references to DatumReader should be corrected. Prevents releasing documentation with broken links.

@claude

claude bot commented Mar 5, 2026

PR 61 Code Review - see details below

@claude

claude bot commented Mar 5, 2026

  • Bug: In read_value (datum.rs:122), resolve_internal is called with None as enclosing_namespace. The old from_avro_datum used value.resolve(schema), which passes schema.namespace() as enclosing_namespace. For schemas with a namespace, this change could break reference resolution. Suggested fix: pass reader_schema.namespace() instead of None.
  • Warning: All 3 deprecated functions say note = "Use `DatumReader` instead", but the type is GenericDatumReader.
  • Minor: datum.rs:119 "Read a Avro" -> "Read an Avro".
  • Positives: consistent API with GenericDatumWriter, ResolvedSchema pre-computed at build time, good use of the bon type-state pattern for reader_schemata ordering.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@avro/src/reader/datum.rs`:
- Line 37: Update doc comments and deprecation notes to reference the real
public type GenericDatumReader and use intra-doc links: replace the reference
"DatumReader" in the doc comment that builds a DatumReader with
"GenericDatumReader" (symbol: GenericDatumReader), change the brittle HTML link
"struct.Reader.html" to an intra-doc link "crate::Reader" (symbol:
crate::Reader), and update all deprecation notes that say "Use `DatumReader`
instead" to "Use `GenericDatumReader` instead" (symbols: deprecation blocks near
the original lines referencing DatumReader/GenericDatumReader) so docs point to
the correct type and use stable intra-doc syntax.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 60a47493-3968-4fa4-8c60-a13c1a7fe460

📥 Commits

Reviewing files that changed from the base of the PR and between aa6255a and 07a261f.

📒 Files selected for processing (12)
  • avro/src/lib.rs
  • avro/src/reader/datum.rs
  • avro/src/reader/mod.rs
  • avro/src/schema/mod.rs
  • avro/src/schema/resolve.rs
  • avro/src/serde/de.rs
  • avro/src/writer/datum.rs
  • avro/tests/avro-3786.rs
  • avro/tests/avro-3787.rs
  • avro/tests/io.rs
  • avro/tests/schema.rs
  • avro/tests/to_from_avro_datum_schemata.rs


```rust
#[bon]
impl<'s> GenericDatumReader<'s> {
    /// Build a [`DatumReader`].
```
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail

# 1) Verify whether a DatumReader symbol exists in the crate.
rg -nP '\b(pub\s+type\s+DatumReader|struct\s+DatumReader|enum\s+DatumReader|trait\s+DatumReader)\b' avro/src || true

# 2) Locate all references that may need update.
rg -n 'DatumReader|struct\.Reader\.html' avro/src/reader/datum.rs avro/src/lib.rs
```

Repository: martin-augment/avro-rs

Length of output: 2041


🏁 Script executed:

```bash
#!/bin/bash
# Read the file and show context around the mentioned lines
sed -n '30,50p' avro/src/reader/datum.rs | cat -n
echo "---"
sed -n '140,160p' avro/src/reader/datum.rs | cat -n
echo "---"
sed -n '170,185p' avro/src/reader/datum.rs | cat -n
echo "---"
sed -n '200,215p' avro/src/reader/datum.rs | cat -n
```

Repository: martin-augment/avro-rs

Length of output: 3451


Update documentation and deprecation notes to reference GenericDatumReader instead of non-existent DatumReader, and replace brittle doc link with intra-doc syntax.

The codebase does not define a DatumReader type. References in docs and deprecation notes should point to GenericDatumReader, which is the actual public type. Additionally, the documentation link at line 145 uses brittle struct.Reader.html syntax and should use the intra-doc link crate::Reader instead.

Affected locations:

  • Line 37: Doc comment references DatumReader
  • Line 145: Link uses struct.Reader.html
  • Lines 147, 176, 208: Deprecation notes all say "Use DatumReader instead"
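A sketch of the documentation pattern these comments ask for: an intra-doc link to `crate::Reader` instead of `struct.Reader.html`, and a deprecation note that names `GenericDatumReader`. The `Reader` and `GenericDatumReader` structs, the `read_raw` method, and the `from_raw_datum` function are hypothetical placeholders so the snippet compiles on its own; only the doc-comment wording and the `#[deprecated(since = "0.22.0", ...)]` attribute mirror the PR.

```rust
use std::io::{self, Read};

/// Hypothetical placeholder for the crate's container-file reader.
pub struct Reader;

/// Hypothetical placeholder for the new datum reader.
pub struct GenericDatumReader;

impl GenericDatumReader {
    /// Placeholder decode step: returns the remaining raw bytes unchanged.
    pub fn read_raw<R: Read>(&self, input: &mut R) -> io::Result<Vec<u8>> {
        let mut buf = Vec::new();
        input.read_to_end(&mut buf)?;
        Ok(buf)
    }
}

/// Decode a single datum without any container-file framing.
///
/// **NOTE** This function has a quite small niche of usage and does NOT take care of reading the
/// header and consecutive data blocks; use [`Reader`](crate::Reader) instead if you don't know
/// what you are doing.
#[deprecated(since = "0.22.0", note = "Use `GenericDatumReader` instead")]
pub fn from_raw_datum<R: Read>(input: &mut R) -> io::Result<Vec<u8>> {
    // The deprecated wrapper only forwards to its replacement type.
    GenericDatumReader.read_raw(input)
}
```

Unlike an HTML path, rustdoc checks intra-doc links when the docs are built, so a later rename of `Reader` would surface as a broken-link warning rather than a silently dead link.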

Owner Author

value:good-to-have; category:documentation; feedback: The CodeRabbit AI reviewer is correct! There is no DatumReader struct or trait. The correct name is GenericDatumReader. All references to DatumReader should be corrected. Prevents releasing documentation with broken links.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request successfully refactors the from_avro_datum* functions into a new GenericDatumReader with a builder pattern, which improves the API. The changes are applied consistently across the codebase, including tests. I've found a few minor documentation typos and a logic bug in one of the refactored deprecated functions. Please see my comments for details.

Note: Security Review did not run due to the size of the PR.

Comment on lines +216 to +221
```rust
GenericDatumReader::builder(writer_schema)
    .writer_schemata(writer_schemata)?
    .maybe_reader_schema(reader_schema)
    .reader_schemata(reader_schemata)?
    .build()?
    .read_value(reader)
```
high

This implementation of from_avro_datum_reader_schemata seems to have a logic issue. The call to .reader_schemata(reader_schemata)? is unconditional, but it requires reader_schema to have been set, which is not guaranteed as it's an Option. If reader_schema is None, maybe_reader_schema will not set it, and the subsequent call to reader_schemata will lead to a compile error due to the where S::ReaderSchema: generic_datum_reader_builder::IsSet bound.

The logic should be conditional, applying reader_schemata only when reader_schema is Some. The previous implementation also ignored reader_schemata when reader_schema was None.

Suggested change

```diff
-GenericDatumReader::builder(writer_schema)
-    .writer_schemata(writer_schemata)?
-    .maybe_reader_schema(reader_schema)
-    .reader_schemata(reader_schemata)?
-    .build()?
-    .read_value(reader)
+let builder = GenericDatumReader::builder(writer_schema)
+    .writer_schemata(writer_schemata)?;
+if let Some(rs) = reader_schema {
+    builder
+        .reader_schema(rs)
+        .reader_schemata(reader_schemata)?
+        .build()?
+        .read_value(reader)
+} else {
+    builder.build()?.read_value(reader)
+}
```

Owner Author

value:useful; category:bug; feedback: The Gemini AI reviewer is correct! The builder API generated by the bon crate seems to have a hole: once the maybe_xyz() method is called, the type-state checks treat 'xyz' as set, even though it stays unset when the argument is None. This prevents a bug where reader_schemata is applied while reader_schema may still be None.
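The gap the author describes can be reproduced with a hand-written type-state builder. This is an analogue, not bon's generated code: `Builder`, the `Unset`/`Set` markers, and `Config` are hypothetical, and the sketch assumes, as the reply above suggests, that a `maybe_*` setter advances the type state even when given `None`. Under that assumption the unconditional chain compiles (so no compile error appears) but keeps `reader_schemata` without a reader schema, while the conditional shape from the suggested change avoids it.

```rust
use std::marker::PhantomData;

// Hypothetical type-state markers; bon generates its own equivalents.
pub struct Unset;
pub struct Set;

#[derive(Debug, PartialEq)]
pub struct Config {
    pub reader_schema: Option<&'static str>,
    pub reader_schemata: Vec<&'static str>,
}

pub struct Builder<S> {
    reader_schema: Option<&'static str>,
    reader_schemata: Vec<&'static str>,
    _state: PhantomData<S>,
}

impl Builder<Unset> {
    pub fn new() -> Self {
        Builder { reader_schema: None, reader_schemata: Vec::new(), _state: PhantomData }
    }

    pub fn reader_schema(self, schema: &'static str) -> Builder<Set> {
        self.maybe_reader_schema(Some(schema))
    }

    /// Assumed `maybe_*` behaviour: the state becomes `Set` even for `None`.
    pub fn maybe_reader_schema(self, schema: Option<&'static str>) -> Builder<Set> {
        Builder { reader_schema: schema, reader_schemata: self.reader_schemata, _state: PhantomData }
    }
}

impl Builder<Set> {
    /// Only callable once the reader schema is "set" at the type level.
    pub fn reader_schemata(mut self, schemata: Vec<&'static str>) -> Self {
        self.reader_schemata = schemata;
        self
    }
}

impl<S> Builder<S> {
    pub fn build(self) -> Config {
        Config { reader_schema: self.reader_schema, reader_schemata: self.reader_schemata }
    }
}

/// Shape of the code under review: compiles, but may keep schemata with no reader schema.
pub fn unconditional(reader_schema: Option<&'static str>, schemata: Vec<&'static str>) -> Config {
    Builder::<Unset>::new()
        .maybe_reader_schema(reader_schema)
        .reader_schemata(schemata)
        .build()
}

/// Shape of the suggested fix: apply the schemata only when a reader schema exists.
pub fn conditional(reader_schema: Option<&'static str>, schemata: Vec<&'static str>) -> Config {
    let builder = Builder::<Unset>::new();
    match reader_schema {
        Some(rs) => builder.reader_schema(rs).reader_schemata(schemata).build(),
        None => builder.build(),
    }
}
```

Here `unconditional(None, vec!["dep.avsc"])` returns a `Config` with no reader schema but one reader schemata entry, whereas `conditional(None, ...)` drops the schemata, matching the behaviour of the pre-PR implementation described in the review.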


```rust
#[bon]
impl<'s> GenericDatumReader<'s> {
    /// Build a [`DatumReader`].
```
medium

There's a small typo in the documentation. It should refer to GenericDatumReader instead of DatumReader to match the struct name.

Suggested change

```diff
-/// Build a [`DatumReader`].
+/// Build a [`GenericDatumReader`].
```

Owner Author

value:good-to-have; category:documentation; feedback: The Gemini AI reviewer is correct! There is no DatumReader struct or trait. The correct name is GenericDatumReader. All references to DatumReader should be corrected. Prevents releasing documentation with broken links.

```rust
/// **NOTE** This function has a quite small niche of usage and does NOT take care of reading the
/// header and consecutive data blocks; use [`Reader`](struct.Reader.html) if you don't know what
/// you are doing, instead.
#[deprecated(since = "0.22.0", note = "Use `DatumReader` instead")]
```
medium

The deprecation note refers to DatumReader, but it should be GenericDatumReader to be consistent with the new implementation.

Suggested change

```diff
-#[deprecated(since = "0.22.0", note = "Use `DatumReader` instead")]
+#[deprecated(since = "0.22.0", note = "Use `GenericDatumReader` instead")]
```

Owner Author

value:good-to-have; category:documentation; feedback: The Gemini AI reviewer is correct! There is no DatumReader struct or trait. The correct name is GenericDatumReader. All references to DatumReader should be corrected. Prevents releasing documentation with broken links.

```rust
/// schemata to resolve any dependencies.
///
/// When a reader `Schema` is provided, schema resolution will also be performed.
#[deprecated(since = "0.22.0", note = "Use `DatumReader` instead")]
```
medium

The deprecation note refers to DatumReader, but it should be GenericDatumReader to be consistent with the new implementation.

Suggested change

```diff
-#[deprecated(since = "0.22.0", note = "Use `DatumReader` instead")]
+#[deprecated(since = "0.22.0", note = "Use `GenericDatumReader` instead")]
```

Owner Author

value:good-to-have; category:documentation; feedback: The Gemini AI reviewer is correct! There is no DatumReader struct or trait. The correct name is GenericDatumReader. All references to DatumReader should be corrected. Prevents releasing documentation with broken links.

```rust
/// schemata to resolve any dependencies.
///
/// When a reader `Schema` is provided, schema resolution will also be performed.
#[deprecated(since = "0.22.0", note = "Use `DatumReader` instead")]
```
medium

The deprecation note refers to DatumReader, but it should be GenericDatumReader to be consistent with the new implementation.

Suggested change

```diff
-#[deprecated(since = "0.22.0", note = "Use `DatumReader` instead")]
+#[deprecated(since = "0.22.0", note = "Use `GenericDatumReader` instead")]
```

Owner Author

value:good-to-have; category:documentation; feedback: The Gemini AI reviewer is correct! There is no DatumReader struct or trait. The correct name is GenericDatumReader. All references to DatumReader should be corrected. Prevents releasing documentation with broken links.
